Outlier detection
Cutting Through the Noise: On-the-fly Outlier Detection for Robust Training of Machine Learning Interatomic Potentials
Terry C. W. Lam, Niamh O'Neill, Christoph Schran, Lars L. Schaaf
The accuracy of machine learning interatomic potentials suffers from reference data that contains numerical noise. Often originating from unconverged or inconsistent electronic-structure calculations, this noise is challenging to identify. Existing mitigation strategies, such as manual filtering or iterative refinement of outliers, require either substantial expert effort or multiple expensive retraining cycles, making them difficult to scale to large datasets. Here, we introduce an on-the-fly outlier detection scheme that automatically down-weights noisy samples, without requiring additional reference calculations. By tracking the loss distribution via an exponential moving average, this unsupervised method identifies outliers throughout a single training run. We show that this approach prevents overfitting and matches the performance of iterative refinement baselines with significantly reduced overhead. The method's effectiveness is demonstrated by recovering accurate physical observables for liquid water from unconverged reference data, including diffusion coefficients. Furthermore, we validate its scalability by training a foundation model for organic chemistry on the SPICE dataset, where it reduces energy errors by a factor of three. This framework provides a simple, automated solution for training robust models on imperfect datasets across dataset sizes.
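The core idea, tracking the loss distribution with an exponential moving average and down-weighting samples that fall far into its tail, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the `decay` and `threshold` hyperparameters, the hard zero/one weighting, and the function name are all illustrative assumptions.

```python
import numpy as np

def ema_outlier_weights(losses, decay=0.99, threshold=3.0):
    """Down-weight samples whose loss is an outlier relative to an
    exponential moving average (EMA) of the running loss distribution.

    Illustrative sketch: `decay` and `threshold` are hypothetical
    hyperparameters, not values from the paper.
    """
    losses = np.asarray(losses, dtype=float)
    # initialize the running statistics from the first batch of losses
    ema_mean = losses.mean()
    ema_var = losses.var() + 1e-12
    weights = np.ones_like(losses)
    for i, loss in enumerate(losses):
        std = np.sqrt(ema_var)
        if loss > ema_mean + threshold * std:
            # loss lies far above the running mean: treat as an outlier
            weights[i] = 0.0
        else:
            # update the running statistics using inliers only
            ema_mean = decay * ema_mean + (1 - decay) * loss
            ema_var = decay * ema_var + (1 - decay) * (loss - ema_mean) ** 2
    return weights, ema_mean, ema_var
```

In a training loop, the per-sample losses (e.g. computed with `reduction='none'`) would be multiplied by these weights before averaging, so flagged samples contribute no gradient while the EMA statistics persist across batches.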
Further Analysis of Outlier Detection with Deep Generative Models
Ziyu Wang
The recent, counter-intuitive discovery that deep generative models (DGMs) can frequently assign a higher likelihood to outliers has implications both for outlier detection applications and for our overall understanding of generative modeling. In this work, we present a possible explanation for this phenomenon, starting from the observation that a model's typical set and high-density region may not coincide. From this vantage point we propose a novel outlier test, the empirical success of which suggests that the failure of existing likelihood-based outlier tests does not necessarily imply that the corresponding generative model is uncalibrated. We also conduct additional experiments to help disentangle the impact of low-level texture versus high-level semantics in differentiating outliers. In aggregate, these results suggest that modifications to the standard evaluation practices and benchmarks commonly applied in the literature are needed.
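The key observation, that a model's typical set and its high-density region need not coincide, is easy to demonstrate with a standard high-dimensional Gaussian: the mode maximizes the density yet lies far outside the typical set, so a typicality-style test (flag points whose code length deviates too far from the entropy) rejects it. This is a generic illustration of that style of test, not the paper's specific proposal; the threshold `eps` is a hand-chosen illustrative value.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000  # the mismatch is a high-dimensional phenomenon

def neg_log_density(x):
    # negative log-density of a standard d-dimensional Gaussian
    return 0.5 * np.sum(x**2) + 0.5 * d * np.log(2 * np.pi)

# differential entropy of the standard d-dimensional Gaussian
entropy = 0.5 * d * (np.log(2 * np.pi) + 1)

def typicality_outlier(x, eps=100.0):
    # flag x when its code length deviates too far from the entropy,
    # i.e. when x falls outside the typical set; eps is illustrative
    # and well below the mode's deviation of d/2 = 500
    return abs(neg_log_density(x) - entropy) > eps

typical_sample = rng.standard_normal(d)  # norm concentrates near sqrt(d)
mode = np.zeros(d)                       # the highest-density point

# The mode has strictly higher density than any sampled point...
print(neg_log_density(mode) < neg_log_density(typical_sample))  # True
# ...yet the typicality test flags the mode, not the typical sample:
print(typicality_outlier(mode))            # True
print(typicality_outlier(typical_sample))  # False
```

A likelihood-threshold test would do the opposite here, scoring the mode as the most "in-distribution" point of all, which mirrors the counter-intuitive behavior the abstract describes.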